* =============================================================================
* File Name: 2016_EC_EnvRegulation--MergeCMAAirQuality_useASMCMA.do

* File Description: This file merges the CMA-year air quality data to the micro
* data. The input file is "CMA_PM25andO3.dta". 

* The merge is performed on CMA and year. The micro data contains two CMA variables
* -- one from the ASM and one from the NPRI. The ASM variable is used here to
* maximize coverage.

* The file's main output is the dta file "npri_asm_envreg_ASMAirQuality.dta".

* Creation date: May 12, 2016

* This version: March 6, 2019

* Author: Nouri Najjar
* =============================================================================

* -----------------------------------------------------------------------------
* Section 1: Call the air quality data ("CMA_PM25andO3.dta") from the cder 
* server. 

* The dataset called for this part is the CMA-year level air quality data 
* constructed by the authors. This data was created by aggregating the individual 
* air quality monitoring data from the NAPS dataset to the CMA-year level. 
* -----------------------------------------------------------------------------
* Set server folder. 
local datadir \\f4cder01\2016_EC_EnvRegulation\DATA\

* Open dataset.
use "`datadir'CMA_PM25andO3.dta", clear
* -----------------------------------------------------------------------------


* -----------------------------------------------------------------------------
* Section 2: Prepare the air quality data for merging. 
* -----------------------------------------------------------------------------
* Create new cma identifier variable to use in merge with npri-asm. 
gen CMA_ID_Merge = float(CMAUID)
* Create new year variable to use in merge. 
gen Year_Merge = REP_PERIOD
* Drop missing years. 
drop if missing(REP_PERIOD)
* Sort the air quality data on cma and year. 
sort CMA_ID_Merge Year_Merge
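* Optional check (an addition, not part of the original workflow): the m:1 merge
* in Section 5 requires each CMA-year to appear only once in this dataset. The
* report below lists any surplus copies without changing the data.
duplicates report CMA_ID_Merge Year_Merge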
* Save for merge
save "`datadir'CMA_PM25andO3_merge.dta", replace
* -----------------------------------------------------------------------------

* Loop over each pollutant
foreach x in PM25 {

* -----------------------------------------------------------------------------
* Section 3: Call the full npri-asm data for the current pollutant (for PM25,
* "asm_npri_2002to2012PM25.dta") from the cder server.
* -----------------------------------------------------------------------------
* Open dataset.
use "`datadir'asm_npri_2002to2012`x'.dta", clear
* -----------------------------------------------------------------------------


* -----------------------------------------------------------------------------
* Section 4: Prepare the npri-asm data for merging with air quality.
* -----------------------------------------------------------------------------
* Create new year variable of size float. 
gen Year_Merge = float(yr4)

***
* Set the cma variable to use in the merge.
***
* Alternative (disabled): use the NPRI CMA variable.
*gen CMA_ID_Merge =  real(cmauid) 

* Use the ASM CMA variable.
* Construct a three-digit CMA code from cmaca2006, a 5-digit Province-CMA
* identifier stored as a string, by removing its first two (province) digits.
* To do this, reverse the string, convert it to a number, and truncate the last
* two digits. Then convert the result to a string, reverse it again, and convert
* it back to a number.
gen cmaca2006_3digstore = int(real(reverse(cmaca2006))*0.01) 
tostring cmaca2006_3digstore, replace
gen lengthvalue = length(cmaca2006_3digstore)
gen cmaca2006_3dig = real(reverse(cmaca2006_3digstore))
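* Converting the reversed string to a number drops its leading zeros, so a CMA
* code ending in zero comes back as a two-digit value. Multiply these values by
* 10 to restore the trailing zero.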
gen CMA_ID_Merge = cond(lengthvalue==2,cmaca2006_3dig*10,cmaca2006_3dig)  
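
* Optional cross-check (an addition, not part of the original workflow): a sketch
* that assumes cmaca2006 is always a 5-character string, so that characters 3 to 5
* hold the CMA code. Under that assumption the count below should be zero; a
* nonzero count flags identifiers that the construction above handles differently.
count if CMA_ID_Merge != real(substr(cmaca2006, 3, 3))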

drop cmaca2006_3digstore lengthvalue

* Sort the npri-asm data on cma and year. 
sort CMA_ID_Merge Year_Merge

* Save the prepared data. This file is overwritten with the merged data in Section 5.
save "`datadir'asm_npri_2002to2012`x'_withAQ.dta", replace
* -----------------------------------------------------------------------------


* -----------------------------------------------------------------------------
* Section 5: Merge in the air quality data on CMA and year

* The using dataset is the air quality data saved in the project folder. 
* -----------------------------------------------------------------------------
* Use many to one matching because each CMA-Year contains many plants.
merge m:1 CMA_ID_Merge Year_Merge using "`datadir'CMA_PM25andO3_merge.dta", gen(_mergeAQ)

* Check the merge results.
tab _mergeAQ

* Drop CMA-Years without any plants
drop if _mergeAQ==2
*drop _merge

summarize O3_CMA
summarize PM25_CMA

* Save merged data
save "`datadir'asm_npri_2002to2012`x'_withAQ.dta", replace
* -----------------------------------------------------------------------------

}


